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Abstract. We investigate fast direct methods for solving systems 
jy^ ■ of the form (B + G)x = y, where B is a limited-memory BFGS 

matrix and G is a symmetric positive-definite matrix. These sys- 
tems, which we refer to as shifted L-BFGS systems, arise in sev- 
eral settings, including trust-region methods and preconditioning 
techniques for interior-point methods. We show that under mild 
assumptions, the system {B + G)x = y can be solved in an effi- 
cient and stable manner via a recursion that requies only vector 
inner products. We consider various shift matrices G and demon- 
strate the effectiveness of the recursion methods in numerical ex- 
periments. 

1. Introduction 

This paper proposes a recursion formula for solving symmetric positive- 
definite shifted limited-memory BFGS (L-BFGS) systems of equations, 
i.e., equations of the form 

(B k + G)x = y, (1) 

where B k is a L-BFGS matrix and G is a symmetric positive-definite 
matrix such that (i) the smallest eigenvalue of G is bounded away from 
zero, and (ii) solves with G + al, where a > 0, are efficient and stable. 

Systems of the form (CD) arise in both unconstrained and uncon- 
strained optimization. In trust-region methods for unconstrained opti- 
mization, the j'th two-norm trust-region subproblem is given by 

minimize Q(s) = qJs-\ — s T Hj.s subject to llslU < 5j, (2) 

where gj = f(xj), Hj = V 2 /(xj), and Sj is the trust-region radius. L- 
BFGS quasi-Newton trust-region methods approximate Hj with an L- 
BFGS quasi-Newton matrix Bj (e.g., [191 QSl HI Q31 E])- In this 



Date: September 25, 2012. 

Key words and phrases, numerical linear algebra, quasi-Newton methods, 
Sherman-Morrison- Woodbury formula, limited-memory BFGS. 

This work was supported in part by NSF grants DMS-08-11106 and DMS-09- 
65711. 

1 



2 



JENNIFER B. ERWAY, VIBHOR JAIN, AND ROUMMEL F. MARCIA 



context, s* is a global solution to the trust-region subproblem if and 
only if ||s*|| < 5j and there exists a unique a* > such that 

{Bj + a*I)s* = -g, and a*(Sj - \\s*\\) = 0. (3) 

Since Bj is symmetric positive-definite, the system matrix in fl3]) is 
symmetric positive-definite, and thus, the matrix-vector equation is an 
example of a shifted L-BFGS system of the form (CQ). In small-scale 
optimization, trust-region methods use matrix factorizations to find a 
pair (s*,a*) that satisfy Q; in particular, the More-Sorenson direct 
method, arguably the best direct solver, makes use of Cholesky factor- 
izations of the shifted (approximate) Hessian to find a global solution of 
the trust-region subproblem [15]. Being able to efficiently solve shifted 
L-BFGS systems enables the use of direct methods such as the More- 
Sorensen direct method for large-scale unconstrained optimization. 

In constrained optimization, shifted L-BFGS systems result when 
preconditioning primal-dual penalty and interior-point methods; more 
generally, these systems can arise in the context of KKT systems or 
saddle-point systems. For example, consider the following system of 
equations 



A D I \x 2 \b 



(4) 



where A is an m x n matrix, H is symmetric, and D is symmetric 
positive definite. Systems of this form are often called "KKT systems" 
or "saddle-point systems". The equivalent doubly- augmented system [7J 
is 

( E + 2A T D~ 1 A A T \ f Xl \ (b x + 2A T D- 1 b 2 , 



A D J \x 2 

and arise in the Newton equations for primal-dual augmented La- 
grangian methods (see, e.g., [61 El HO]). Systems of the form ()5]) also 
arise in the Newton equations associated with primal-dual interior- 
point methods (see, e.g., [HI [HI El E] ) - In both applications, typically H 
is the Hessian of the Lagrangian, A is the constraint Jacobian, and D 
is a positive-definite diagonal matrix, which serves as a regularization 

mm)- 

If the matrix H + 2A T D X A is positive definite, the system matrix 
in ^ is positive definite [7|; thus, preconditioned conjugate-gradients 
(PCG) may be used to solve (0). Forsgren et al. [7] recommend a block 
preconditioner of the form 



P 



B + 2A T D- 1 A A T 
A D 
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where B is an approximation of H such that B + 2A T D~ 1 A is positive 
definite. One benefit of this preconditioner is that efficient solves with 
P can be computed provided solves with B + 2A T D~ 1 A are efficient. 
To see this, note that 

(B + 2A T D~ 1 A A T \ U 
\ A D ) \v 2 

is equivalent to first solving {B + A T D~ 1 A)vi = n — A T D~ 1 r 2 , for v\ 
and then directly computing v 2 = D~ 1 (r 2 — Av\). In the case when A 
is a constant positive-definite diagonal matrix (e.g., the constraints are 
simple bounds) , solves with P are efficient whenever solves with B + G 
are efficient, where G is a symmetric positive-definite diagonal matrix. 
In large-scale optimization, L-BFGS matrices are a common choice for 
approximating matrices of unknown structure. If B is taken to be an 
L-BFGS approximation to H, then the resulting system to be solved is 
a shifted L-BFGS system. 

In this paper we investigate fast direct methods for solving shifted 
L-BFGS systems where the shift G is a symmetric positive-definite ma- 
trix. Recent work has developed formulas for the case when G is a 
diagonal matrix [3j 0]; however, no stability proof was given for the 
proposed recursion formulas. In this paper, we derive recursion formu- 
las for the cases when G is sufficiently positive-definite (i.e., the smallest 
eigenvalue of G is bounded below) and solves with G + al, where a 
is a constant, are efficient and stable. An important contribution of 
this paper is a stability proof for the proposed recursion formulas that 
includes the case when G is a positive diagonal matrix. Moreover, we 
present a bound on the condition number of the shifted L-BFGS matrix 
B k + G. 

This paper is organized in six sections. Section 2 is a review of L- 
BFGS updates, including the famous two-term recursion formula [16] 
for solves with the L-BFGS matrix. Section 3 introduces shifted L- 
BFGS systems and reviews the recursion formula for shifted L-BFGS 
systems. In Section 4, we present stability results and bound the con- 
dition number of the shifted systems. We demonstrate the effectiveness 
of the recursion methods in Section 5. Future research directions and 
conclusions are found in Section 6. 




2. Limited-memory BFGS matrices 
In this section, we review L-BFGS matrices and their updates. 
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The BFGS matrix is defined by a sequence of pairs of updates { (sj, yi) } 
as follows: 

Si = x i+1 - Xi and y { = Vf(x i+1 ) - V/(x;). (6) 

The initial BFGS matrix is taken to be a scalar multiple of the iden- 
tity, i.e., Bo = 7 fc ~ 1 /. For each pair (si,yi), the quasi-Newton matrix is 
updated as follows: 

B i+1 = Bi- didf + bibj, (7) 



where 



BiSi yi 
and bi 



Each BFGS matrix Bi is symmetric positive definite. In practice, the 
initial matrix B is often taken to be be 7^ = Sfc_i2/&-i/||z/fc-i||2> where 
k is the index of the last stored pair (see, e.g., [H] or [T6]). 



In large-scale optimization, it is advantageous to store only a few of 
the most recent pairs i.e., typically less than ten (Byrd et 

al. [1] recommend between two and six). In this case, the BFGS matrix 
is referred to as a limited-memory BFGS (L-BFGS) matrix. For this 
paper, we consider L-BFGS matrices where M denotes the maximum 
number of stored L-BFGS updates. 

One advantage of L-BFGS updates is that for solving systems of the 
form B^x = r there is a two-loop recursion formula, which is presented 
in Algorithm IO[T6l[T7]. 



Algorithm 2.1: Two-loop recursion to compute x = B^ r. 

q ^— r; 

for i — k — 1 , . . . , 

^ (sfq)/(yfsi); 
q- anyi] 

end 

x B^q; 

for i = 0, . . . , k — 1 

<- (ylx)/(yfsi); 

x <— x + (ojj — f3)si\ 
end 



To solve shifted L-BFGS systems, one might be tempted to make a 
simple change to Algorithm 12.11 particularly by replacing x 4— B^ l q 
with x <— (Bo + G)~ l q in between the two loops. However, this does 
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not yield a solution to (B^ + G)x = r. It is easiest to see this by 
considering the case of only one update. Let Bq = (Bq + G). Then the 
two- loop recursion where the x <— B ~ 1 q step is replaced by x 4— B ~ 1 q 
would yield a solution that satisfies x = B^ x r, where 



B\ = Bq — aoaQ + b^b^ with ao 



B s 



Sq B S 



(see i.e., 

B, = (B + G) - aoal + b Q b T ± (B + G) - a a^ + b b^ = (B 1 + G). 

Thus, x satisfies B\x = r; however, x does not satisfy the shifted L- 
BFGS system, i.e., [B 1 + G)x ^ r. 

For the duration of the paper we assume that at most M pairs 
{(si,yi)}, i = 0, ...M — 1 are stored at any given time. Moreover, 
we assume the usual requirement that a pair (sj, yi), i = 0, . . . M — 1, 
must satisfy yfsi > in order to guarantee that each Bi is positive 
definite. For k < M, the kth vectors Sk and y^ are stored as the Arth 
column in S and F, respectively. We use Algorithm 2.2 to update the 
matrices S and F as new pairs (sk, yk) are generated; thus, at all times 
we have exactly k stored vectors with k < M — 1. 



Algorithm 2.2: Update S and F. 

if k < M- 1, 

5 s fc ]; F^[F y fc ]; fc<-A; + l; 

else 

for i = 0, . . . fc — 1 

^- Si+i; Vi <— j/t+i; 

end 

5 <- [s , . . .Sfc_i]; F «- [j/o, • • -Vk-i]; 
end 



3. Shifted L-BFGS Methods 

Consider the problem of finding the inverse of Bj, + G, where B^ 
is an L-BFGS quasi-Newton matrix and G is a symmetric positive- 
definite matrix. Let # max and # m i n be bounds on the eigenvalues of 
G, i.e., < 9 min < X(G) < 9 max . We assume that solving systems 
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involving G + al where a is a scalar can be done in an efficient and 
stable manner. 

The Sherman-Morrison- Woodbury (SMW) formula gives the follow- 
ing formula for computing the inverse of A + uu T , where A is an n x n 
symmetric and invertible matrix and uu T is a symmetric rank-one up- 
date (see [IT]) with u eW 1 : 

{A + uu T y l = a- 1 1— r 1 ^- 1 

v ' 1 + u A u 

= A ~ l Vr-, — t^A-VA' 1 . 

1 + trace(v4 l uu 1 ) 

We now use the SMW formula for computing the inverse of Bj, + G. 
Let the matrices E 2 % = —aiaj and E 2 i+i = b{bf the rank-one updates 
in (jSJ, and define the matrices 

Co — (G + 7 fc 1 /) , G\ = C + E , • • ■ , C 2k = C 2 k-i + E 2 k-i- 

(9) 

We note that C 2i = Bi + G. We compute (B k + G) 1 z by noting that 
C 2 fc = B k + G and applying the SMW formula to 2 recursively: For 
< % < 2k, 

C~ + \z = C-h - 1 Cr^C-h, (10) 

1 + trace (Cj Ei) 

and thus, recursively applying ( jTOi) to Cf z, we obtain 



2fc~l 

C- 1 ^ = C^z - = — -C^EiC^z. fir 

1 + trace(Cf\E^ 



i=0 



We now show that f lTT]) can be computed efficiently using only vector- 
vector inner products. Let r» = 1/(1 + trace(Cj rl i?j)), and let 

{C~ x a% if z is even 
C-'bL if i is odd. (12) 

Then {C~ x EiC~ x )z = (pipf)z = (j>fz)pi, and ( fTTi) simplifies to 

2k-l 

C^z = C^z+Y^i-lYntfz)*. (13) 

i=0 

We assume that Cq -1 ^ = (G + 7^ is easy to compute. Therefore, 

the bulk of the computational effort in forming C k ~ 1 z involves the inner 
product of z with the vectors Pi for i = 0,1, ... ,k. What remains to 
be shown is how to compute and Pi efficiently. 
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First, we note that using ( fl2l) . 



trace^" 1 ^) 



Thus, Tj simplifies to 



1 



2 
2 



if i is even 
if i is odd. 



1 - a\Pi 

2 

1 

1 + btiPi 



if i is even 
if i is odd. 



Finally, notice that we can compute pi in (TT2]) by evaluating ( TT5|) at 

2 = a. or z = : 



C 1 a 1 + (— l) J Vj (pja ^ )pj if i is even 



Pi = < 



j=0 
i-l 



C + ^^(— l) J rj(pj6i^i if i is odd. 



j=0 



Thus, computing and storing a T L pj and b 1 i_ 1 pj enables us to easily com- 

2 2 

pute Tj and pj. Algorithm 3.1 computes in ([9]) using (fiTj) and 



13): 



Algorithm 3.1: Proposed recursion to compute x = C 2t }r = 
{B k + G)- l r. 

for j = 0, . . . , 2k - 1 
if j even 
c <- Oj/ 2 ; 

else 

c <- & c?-i)/2; 

end 

for i = 0, . . . , j — 1 

Pi <~Pj + (-!)^i(pfc)pi; 

end 

^•^l/(l + (-l)^VJc); 
x <- x + (— l) 5+1 u/(pjz)pj; 
end 
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4. Stability 

It is well-known that the SMW formula for inverting a rank-one 
update to a nonsingular matrix can be numerically unstable (see, e.g., 
[12l l20l |21~]). In this section, we address how this potential instability 
in our proposed recursion approach is mitigated. To show that the 
proposed recursion approach in computing C^ +1 z in (j5J) is stable, we 
first show that 1 + trace (C~ 1 E i ) is sufficiently bounded away from zero. 

The potential source of instability is in the computation of rf. 



1 



if i is even 



Ti = r —, r- = { 2 - 7 (14) 

1 + trace (Q EA — c -i b . - if i is °dd. 



When i is odd, there is no instability because the denominator in ( 1141) 
is bounded away from zero since Cj is positive definite: 

1 + trace(Cr 1 ^) = 1 + bUC^b^ > 1. 

However, when % is even, subtraction in the denominator of (I14j) could 
cause catastrophic cancellation. To show that the proposed recursion 
formula is stable, we prove that the denominator in (j!4p is bounded 
away from zero when % is odd. To do this, we first compute bounds on 
the largest eigenvalue of Cj for every i with < i < 2k — 1. 

Theorem 1. Let i = 2j for some j G N, and let 

C i+1 = Ci- djdj and C i+2 = C i+1 + bjbj, 

where aj and bj are defined in (jSJ), {C{\ are defined as in (Q, and 
< #mm < -MG) < Om&x- Then the eigenvalues of Cj+i anc? Cj + 2 ore 
bounded above and below by the following: 

1 j IMI2 

#min < A(Cj + l) < 7^ + + ^ 



3+1 



#min < A(Ci +2 ) < 7fc 1 + + ^ 



lygjjj 



1=0 

Proof. Recall that for any square symmetric matrix A, 

A m in(^4) = rnm z T Az and A max (^4) = max z T Az. 

||z||2 = l ||z||2 = l 



SHIFTED L-BFGS SYSTEMS 



Henceforth, we assume that z G M n is any arbitrary vector with ||z||2 
1. Then 

z T C i+1 z = z T (b, - sT l B s BjSjsjBfj z + z T Gz 



\B^ z \\i-^ \rr 3,) +* t gz 



3 112 \\By%wi 



= \\B)' 2 z\\l(l-cos 2 (/.{B)' 2 z,B)%)))+z T Gz 
~> ft 

which implies that A m i n (Cj + i) > 9 m - m . Since Cj + 2 = -Bj+i + G, Bj+i is 
positive definite, and A min (G) > min , it follows that 

Z T Ci+2 z = Z T Bj + iZ + z T Gz > # m i n . 

Therefore, A min (C i+2 ) > min . 

Next, we compute upper bounds on A max (Cj + i) and A max (C i+2 ). 
First, we consider an upper bound for A max (-Bj+i) : 



z T B j+1 z = z T (B j - 7 ^—B J s,sjBj + -^-y j yj)z 
= (z T B jZ ) (l - cos 2 (Z(5]/ 2 ,, B)/%))^ + M 



T 



< z BjZ 



\\vj\\ 



2 



VJ S 3 

since \yjz\ < \\yjW2WzW2 cos(Z(yj, z)), which is maximized at z = yj/\\yj\\2, 
and thus, \yjz\ < 1 1 3/^ 1 1 2 - Applying this recursively and setting B = 
7^" 1 /, obtain 

A max (s, +1 ) < 7 -i + £%!!!. (15) 



Since 



z T C i+ iz = z T ^Bj jt~ BjSjsjBj^j z + z T Gz 

= (z t B jZ ){1 - cos 2 {Z(Bfz, B)%))) + z T Gz 
< A max (5j) + A max (G), 
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then by ([H]) 



3-1 ||^||2 

A m ax(Cj+l) — Ik + 

max • 

„ — ' yi S£ 

Similarly, z T C i+2 z = z T (B j+1 +G)z < A max ( J B j+ i)+A max (G). Therefore 

3 11^ 112 

A m ax(Cj+2) < 7fc + / , T ^ $max; 

which gives the desired bounds. □ 

Note that the bound on A max (Cj +3 ) is also a bound on A max (Cj +2 ) since 
yfsi > for I — 0, . . . , k — 1. Theorem [T] allows us to bound the 
condition number of Ci + \ and Ci +2 in the following corollary: 

Corollary 1. If we let Yi — [ yo y\ ■ ■ • yi ] G K nx (* +1 ) and require 
that yjsi > S for some 5 > for each £, then the condition numbers of 
Cj + i and Ci + 2 are bounded by the following: 



'mm 



'mm 



Now we demonstrate that the denominator in ( !T4|) is bounded away 
from 0. 

Lemma 1. Let Yj — [ y yi ■ ■ ■ y,j ] E M. nx C3+ 1 ). Suppose that yjsi > 
5 > for I = 0, . . . , k — 1, then for i = 2j , 

l-aTCfV > 9a f > 0. 

Proof. Using the definition of aj given in (JHJ), 

aJCr\ = -^-(sjB^ + Gy^sX (16) 
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1 Icy 

Letting q = Ba sj in (116H . we obtain 



q T (I + BT 1/2 GBT 1/2 )-\ 




q T q 


^max 1 


{{I + BJ l/2 GBJ l/2 ) 




1 


Amin | 


[l + B^GBJ^ 




1 



< 7 \~ (17) 



1 + A min ( Bj X I 2 GB^ 1 ^ 2 



since A min (/ + BT 1/2 GBJ 1/2 ) > A min (J) + \ mhi {BJ 1/2 GBJ 1/2 ) by, for 
example, [HI Theorem 8.1.5]. Note that 



i i 



Amin B- 2 GB,j = mm J —=, — = mm > tttt-^ 



x^o x T x y^o y T B j y A max ( J B j ) 



Then, flT7]) together with flTg]) and (US} yields 

1 qFC~^&' ^min ^ ^min 

Finally, since y^s; > <5 > for / £ {0, . . . , k}, we obtain 

l-ajcrv > — „ 6m 'T 9 , > 0, 

as desired. □ 

The following theorem shows that computing C~^ ± r is stable. 

Theorem 2. Algorithm 3.1 for computing C~ l r is stable for i = 
1, 2, • • • , 2k, provided solves with G + al are stable for a > 0. 

Proof. The proof is by induction on i. For the base case i — 1, let 
f = C ' 1 r and a = C " 1 ao; then, by ffTOl) we have 

CfV = C^r-- Q' 1 a og'C^ 1 r = f-- ^(a^f)a. 

1 — ag C a 1 — ao a 

Substituting in for ao using (JSJ) together with Bq = 7 A T 1 / yields 

r^i _ tfBoiG + tfl^BoSo _ sj \G + % x s 1 
a o°o a o — t75 — f — a — < l - 

Sq BqSq 7fc S 1 + JkO m in 
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Therefore, 1 — clqCq 1 ^ > 0, and C~ l r is computed in a stable manner. 

For the induction step, we assume that computing C~ l r is stable. 
Then we need to show computing C^r is stable. Since 

C7 + \r = Cr\ - - 1 — T —Cr 1 E l Cr\ (19) 

+ 1 + trace(C i ~ 1 J E J ) 

we only need to show that the second term can be computed in a stable 
manner. Using Lemma dj 

0„ 



'nun 



if i is even 



1 + trace(C , r 1 E i ) > { % l + ll*i-i|IIA* + , (20) 

if % is odd 

which implies that 1 + trace(Cj~ 1 £'j) is bounded away from zero, and 
thus, the denominator in ( 1X91) is bounded away from zero. By assump- 
tion computing C~ 1 r is stable; therefore computing C^ 1 Efi^ r is sta- 
ble. Thus, computing C^r is stable, which completes the proof. □ 

Note that it is possible for in the lower bound in (120]) to 

become very large. To mitigate this problem, can be used as 

a criterion for restarting the L-BFGS method. 

5. Numerical experiments 

We demonstrate the effectiveness of the proposed recursion formula 
for solving shifted L-BFGS linear systems of varying sizes, from n = 10 3 
to n — 10 6 . We used M = 5 as the number of L-BFGS updates, in line 
with recommendations in pp. The proposed method was implemented 
in MATLAB and was compared to a direct method using the MAT- 
LAB "backslash" command and the built-in conjugate- gradient (CG) 
method (pcg.m). Because of limitations in memory size, we were only 
able to use the direct method for problems where n < 10, 000. We 
report the relative residuals computed by each of the methods. For the 
convergence criteria for CG, we used the relative residuals from the re- 
cursion formula's results. This means that the time reported for CG is 
how long it takes for CG to achieve the same accuracy as the proposed 
recursion method. We note that preconditioning was not applied in 
pcg.m in our numerical experiments since it was not clear which pre- 
conditioning matrices would be effective. However, we believe that the 
performance of CG would improve if such effective preconditioners were 
to be found and used. In previous work [31 H], we presented results for 
when G = a I with a > and G = D, where D is a diagonal ma- 
trix with diagonal elements > a > 0. Here, we consider two other 
symmetric positive-definite shifts: tridiagonal and circulant matrices. 
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n 


Relative Residual 
Direct CG Recursion 


Time 

Direct CG Recursion 


1000 
2000 
5000 
10000 


1.6e-15 7.0e-16 9.0e-16 
1.6e-15 3.3e-16 3.4e-16 
2.4e-15 1.8e-16 2.9e-16 
4.8e-15 6.3e-16 6.4e-16 


0.0185 0.0114 0.0029 
0.1265 0.0168 0.0043 
1.0453 0.0359 0.0094 
6.7182 0.0664 0.0169 



Table 1. Numerical results for small problems where 
the shift is a tridagonal matrix. Here, the size of the 
matrix varies between n = 1,000 and 10,000 (Column 
1). Columns 2-4 are the relative residuals of the solutions 
obtained from using a direct method, conjugate gradients 
(CG), and the proposed recursion method, respectively. 
Columns 5-7 are the corresponding computational times. 
The matrices were generated randomly. 

For the tridiagonal matrices, the diagonal elements [Gtrf] i,i = 2 + 
a + g^i, where is randomly chosen from a uniform distribution be- 
tween and 1 and a > 0. The off- diagonal elements are given by 
[Ctri]i,i+i = [Gtri]i+i,i — 9i,ji where g^j is randomly chosen from a uni- 
form distribution between —1 and 0. The a on the diagonal is added 
to ensure that G tri is positive definite. In our numerical experiments, 
we chose a — 0.1. For the circulant matrix, G C i rc , we enforce positive 
definiteness by requiring that G C irc be diagonally dominant. In our 
numerical experiments, we randomly generated the first row of G C i rc , 
let the diagonal element be the sum of the non-diagonal row elements 
plus a = 0.1, and generated the matrix G circ accordingly. Computing 
(G + 7^ 1 /)~ 1 c in the recursion algorithm (Algorithm 13. II) is done using 
the discrete Fourier transform (see [TTJ Equation (4.7.10)]). 

Results. The three methods (the direct MATLAB "backslash" com- 
mand, CG, and the proposed recursion) were run on shifted L-BFGS 
matrices whose shifts were tridiagonal and circulant for varying prob- 
lem sizes (10 3 < n < 2 x 10 6 ). Tables 1-4 report results of the randomly 
generated test problems; they are typical of what we have seen from 
many runs of the methods. Table 1 shows the relative residuals and the 
computational time on smaller problems for the three methods for solv- 
ing shifted L-BFGS systems with tridiagonal shifts. All three methods 
are able to compute accurate solutions. As expected, the direct method 
requires more computational time than CG and the recursion method 
since both CG and the recursion method do not form the shifted matrix 
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n 




Relative 


Residual 




Time 






CG 


Recursion 




CG 


Recursion 


50000 


1 


08e-15 


6.44e-16 





24666 





079500 


100000 


2 


53e-16 


3.18e-16 





50530 





158202 


200000 


6 


49e-16 


7.37e-16 





81331 





458348 


500000 


1 


53e-15 


1.52e-15 


2 


62797 


1 


270633 


1000000 


5 


25e-16 


3.38e-16 


9 


98859 


2 


514294 


2000000 


1 


55e-15 


8.94e-16 


23 


77123 


4 


950696 



Table 2. Numerical results for large problems where 
the shift is a tridagonal matrix. Here, the size of the 
matrix varies between n = 100,000 and 2,000,000 
(Column 1). Columns 2-3 are the relative residuals of 
the solutions obtained from using conjugate gradients 
(CG) and the proposed recursion method, respectively. 
Columns 4-5 are the corresponding computational times. 
The matrices were generated randomly. 



n 


Relative Residual 
Direct CG Recursion 


Time 

Direct CG Recursion 


1000 
2000 
5000 
10000 


l.le-15 2.9e-16 4.9e-16 
1.5e-15 3.4e-16 4.6e-16 
2.6e-15 5.6e-16 7.1e-16 
3.5e-15 5.8e-16 7.3e-16 


0.0185 0.0088 0.0027 
0.1300 0.0117 0.0039 
1.0251 0.0215 0.0085 
6.7091 0.0311 0.0162 



Table 3. Numerical results for small problems where 
the shift is a circulant matrix. Here, the size of the ma- 
trix varies between n = 1,000 and 10,000 (Column 1). 
Columns 2-4 are the relative residuals of the solutions ob- 
tained from using a direct method, conjugate gradients 
(CG), and the proposed recursion method, respectively. 
Columns 5-7 are the corresponding computational times. 
The matrices were generated randomly. 



explicitly. Table 2 reports results for the larger problems where direct 
methods are no longer a viable option. For these larger problems, both 
the recursion method and CG compute accurate solutions. However, 
the proposed recursion method is, in general, approximately four times 
faster than the CG method. We note that these results are consistent 
with our previous results for diagonal matrices G (see [31 II])- 
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n 




Relative 


Residual 




Time 






CG 


Recursion 




CG 


Recursion 


50000 


6 


58e-16 


7.49e-16 





096941 





071781 


100000 


5 


61e-16 


7.45e-16 





170306 





117156 


200000 


6 


39e-16 


7.93e-16 





286275 





244446 


500000 


6 


67e-16 


8.07e-16 





900012 


1 


158381 


1000000 


6 


88e-16 


8.35e-16 


3 


901765 


2 


531363 


2000000 


6 


84e-16 


8.43e-16 


8 


316006 


6 


231911 



Table 4. Numerical results for large problems where 
the shift is a circuant matrix. Here, the size of the matrix 
varies between n = 100,000 and 2,000,000 (Column 1). 
Columns 2-3 are the relative residuals of the solutions 
obtained from using conjugate gradients (CG) and the 
proposed recursion method, respectively. Columns 4-5 
are the corresponding computational times. The matri- 
ces were generated randomly. 

Similar results are hold for shifts with circulant matrices. However, 
we notice that CG requires fewer iterations to converge than in the 
tridiagonal case. This is due to the fact that the matrix C7 C i rc is sig- 
nificantly more diagonally dominant than Gtm- (Recall that we choose 
the diagonal element to be greater than the sum of the off-diagonal 
row elements, which are all nonzero in general for the circulant case; 
whereas, in the tridiagonal case they are nonzero only in two places. 
Thus, the diagonal elements of L7 circ are not only identical, but they 
are also very large compared to the non-diagonal elements in magni- 
tude.) Thus, on these systems CG and recursion method are closer in 
computational time. In general, it is worth noting that in addition to 
computational-time benefits, the proposed recursion method does not 
suffer from potential stagnation of CG and other iterative methods. 

6. Conclusion 

In this paper, we proposed a direct method for solving shifted L- 
BFGS systems that arise in unconstrained and constrained optimiza- 
tion. The recursion formula is not only able to handle very large prob- 
lems (n = O(10 6 )) for which other direct methods such as Gaussian 
elimination fail, it is also very fast, being very competitive with con- 
jugate gradient methods. It is memory efficient since matrices are not 
explicitly stored. Most importantly, it is provably stable. Future work 
in this field includes extensions to other quasi-Newton methods such as 
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the symmetric rank-1 and DFP updates as well as non-positive-definite 
shifts. 
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